Baum-Welch Training for Segment-based Speech Recognition

Authors

  • Han Shu
  • I. Lee
  • James Glass
Abstract

The use of segment-based features and segmentation networks in a segment-based speech recognizer complicates probabilistic modeling, because it alters both the sample space of all possible segmentation paths and the feature observation space. This paper describes a novel Baum-Welch training algorithm for segment-based speech recognition that addresses these issues through an innovative use of finite-state transducers. The procedure has the desirable property of not requiring the initial seed models needed by the Viterbi training procedure we used previously. On the PhoneBook telephone-based corpus of read, isolated words, the Baum-Welch training algorithm obtained a relative error reduction of 37% on the training set and of 5% on the test set, compared with Viterbi-trained models. When combined with a duration model and a more flexible segmentation network, the Baum-Welch-trained models achieve an overall word error rate of 7.6%, which is the best result we have seen published for the 8,000-word task.
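
The abstract does not spell out the Baum-Welch re-estimation step itself. As background only, the following is a minimal sketch of one Baum-Welch (forward-backward EM) iteration for a conventional frame-based HMM with discrete observations, assuming a single training sequence of at least two frames and scaled forward-backward recursions. It is not the segment-based, finite-state-transducer formulation described in the paper; the function names, the parameter names (`pi`, `A`, `B`, `obs`) and the toy model are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward_backward(pi: Sequence[float], A: Matrix, B: Matrix,
                     obs: Sequence[int]) -> Tuple[Matrix, Matrix, List[float]]:
    """Scaled forward-backward pass for a discrete-observation HMM.

    Each alpha[t] is normalized to sum to 1; scales[t] is the normalizer at
    frame t, so log P(obs) = sum(log(c) for c in scales). beta uses the same
    scale factors, which makes alpha[t][i] * beta[t][i] the state posterior.
    """
    N, T = len(pi), len(obs)
    alpha: Matrix = [[0.0] * N for _ in range(T)]
    scales = [0.0] * T
    alpha[0] = [pi[i] * B[i][obs[0]] for i in range(N)]
    for t in range(T):
        if t > 0:  # forward recursion from the (already scaled) previous frame
            alpha[t] = [sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                        for j in range(N)]
        scales[t] = sum(alpha[t])
        alpha[t] = [a / scales[t] for a in alpha[t]]
    beta: Matrix = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):  # backward recursion, scaled by scales[t + 1]
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   / scales[t + 1] for i in range(N)]
    return alpha, beta, scales


def baum_welch_step(pi: Sequence[float], A: Matrix, B: Matrix, obs: Sequence[int]):
    """One EM iteration (assumes len(obs) >= 2).

    Returns (new_pi, new_A, new_B, log P(obs | input parameters)).
    """
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scales = forward_backward(pi, A, B, obs)
    # E-step: state posteriors gamma[t][i] and summed expected transition counts.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]
    xi_sum = [[0.0] * N for _ in range(N)]
    for t in range(T - 1):
        for i in range(N):
            for j in range(N):
                xi_sum[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                 * beta[t + 1][j] / scales[t + 1])
    # M-step: normalize the expected (soft) counts into new probabilities.
    new_pi = list(gamma[0])
    new_A = []
    for i in range(N):
        occ = sum(gamma[t][i] for t in range(T - 1))
        new_A.append([xi_sum[i][j] / occ for j in range(N)])
    new_B = []
    for i in range(N):
        occ = sum(gamma[t][i] for t in range(T))
        new_B.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / occ
                      for k in range(M)])
    log_lik = sum(math.log(c) for c in scales)
    return new_pi, new_A, new_B, log_lik


if __name__ == "__main__":
    # Hypothetical 2-state, 3-symbol toy model and observation sequence.
    pi = [0.6, 0.4]
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    obs = [0, 1, 2, 2, 1, 0]
    for it in range(5):
        pi, A, B, ll = baum_welch_step(pi, A, B, obs)
        print(f"iteration {it}: log P(obs) = {ll:.4f}")
```

Every frame contributes fractional counts (`gamma` and the summed `xi`) to every state, rather than the hard counts of a single best path. Each call reports the log-likelihood under its input parameters, and EM guarantees this value does not decrease from one iteration to the next.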

Similar resources

Comparative Study of the Baum-Welch and Viterbi Training Algorithms Applied to Read and Spontaneous Speech Recognition

In this paper we compare the performance of acoustic HMMs obtained through Viterbi training with that of acoustic HMMs obtained through the Baum-Welch algorithm. We present recognition results for discrete and continuous HMMs, for read and spontaneous speech databases, acquired at 8 and 16 kHz. We also present results for a combination of Viterbi and Baum-Welch training, intended as a trade-off...
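
To make the contrast in this comparative study concrete, here is a hedged sketch of Viterbi training for a toy discrete HMM: the single best state path is found in the log domain, and the transition and emission probabilities are re-estimated from hard counts along that path. The function names, the additive `floor` count and the choice to keep `pi` fixed are illustrative assumptions, not details from the cited paper.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def safe_log(p: float) -> float:
    """log(p), mapping p == 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_path(pi: Sequence[float], A: Matrix, B: Matrix,
                 obs: Sequence[int]) -> List[int]:
    """Single most likely state sequence (log-domain Viterbi with backpointers)."""
    N, T = len(pi), len(obs)
    delta = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(N)]
    backptrs: List[List[int]] = []
    for t in range(1, T):
        ptr, new_delta = [], []
        for j in range(N):
            # Best predecessor of state j at frame t.
            best = max(range(N), key=lambda i: delta[i] + safe_log(A[i][j]))
            ptr.append(best)
            new_delta.append(delta[best] + safe_log(A[best][j]) + safe_log(B[j][obs[t]]))
        backptrs.append(ptr)
        delta = new_delta
    # Trace back from the best final state.
    state = max(range(N), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def viterbi_training_step(pi: Sequence[float], A: Matrix, B: Matrix,
                          obs: Sequence[int],
                          floor: float = 1e-3) -> Tuple[Matrix, Matrix, List[int]]:
    """Re-estimate A and B from hard counts along the Viterbi path.

    `floor` is a hypothetical additive count that keeps unseen events from
    receiving zero probability; pi is left unchanged in this sketch.
    """
    N, M = len(pi), len(B[0])
    path = viterbi_path(pi, A, B, obs)
    trans = [[floor] * N for _ in range(N)]
    emit = [[floor] * M for _ in range(N)]
    for t, s in enumerate(path):
        emit[s][obs[t]] += 1.0
        if t > 0:
            trans[path[t - 1]][s] += 1.0
    new_A = [[c / sum(row) for c in row] for row in trans]
    new_B = [[c / sum(row) for c in row] for row in emit]
    return new_A, new_B, path
```

Because each frame is credited to exactly one state, the quality of the re-estimated model depends on the alignment produced by the current model, which is why Viterbi training is usually started from reasonable initial models.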

Optimization-Based Control for the Extended Baum-Welch Algorithm

The extended Baum-Welch (EBW) is the most popular algorithm for discriminative training of speech recognition acoustic models. The EBW algorithm is usually controlled with heuristic rules, which are used to determine the smoothing parameters of the algorithm. In this paper we propose a control method for EBW which is based on the optimization of an error measure over a small control set. The la...
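
For orientation, the extended Baum-Welch re-estimation of a Gaussian mean and variance is commonly written as below (the form widely used for MMI training). Here $\gamma_{jm}$, $\theta_{jm}(O)$ and $\theta_{jm}(O^2)$ are the occupancy, first-order and second-order statistics of mixture component $m$ in state $j$, accumulated separately for the numerator (reference) and denominator (competing) models, and $D_{jm}$ is the smoothing constant whose choice, per the abstract, is usually governed by heuristic rules. This notation is assumed here and is not taken from the cited paper.

```latex
\hat{\mu}_{jm} = \frac{\theta^{\mathrm{num}}_{jm}(O) - \theta^{\mathrm{den}}_{jm}(O) + D_{jm}\,\mu_{jm}}
                      {\gamma^{\mathrm{num}}_{jm} - \gamma^{\mathrm{den}}_{jm} + D_{jm}}
\qquad
\hat{\sigma}^{2}_{jm} = \frac{\theta^{\mathrm{num}}_{jm}(O^{2}) - \theta^{\mathrm{den}}_{jm}(O^{2})
                              + D_{jm}\left(\sigma^{2}_{jm} + \mu^{2}_{jm}\right)}
                             {\gamma^{\mathrm{num}}_{jm} - \gamma^{\mathrm{den}}_{jm} + D_{jm}}
                        - \hat{\mu}^{2}_{jm}
```

A larger $D_{jm}$ keeps the update closer to the current parameters; at a minimum it must be large enough that every $\hat{\sigma}^{2}_{jm}$ stays positive.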

Segmentation of speech using speaker identification

This paper describes techniques for segmentation of conversational speech based on speaker identity. Speaker segmentation is performed using Viterbi decoding on a hidden Markov model network consisting of interconnected speaker sub-networks. Speaker sub-networks are initialized using Baum-Welch training on data labeled by speaker, and are iteratively retrained based on the previous segmentatio...

Efficient ML training of CDHMM parameters based on prior evolution, posterior intervention and feedback

We present an efficient maximum likelihood (ML) training procedure for Gaussian mixture continuous density hidden Markov model (CDHMM) parameters. This procedure is proposed using the concept of approximate prior evolution, posterior intervention and feedback (PEPIF). In a series of experiments for training CDHMMs for a continuous Mandarin Chinese speech recognition task, the new PEPIF procedur...

A comparative study on maximum entropy and discriminative training for acoustic modeling in automatic speech recognition

While Maximum Entropy (ME) based learning procedures have been successfully applied to text-based natural language processing, there has been little investigation of ME for acoustic modeling in automatic speech recognition. In this paper we show that the well-known Generalized Iterative Scaling (GIS) algorithm can be used as an alternative method to discriminatively train the parameters ...
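
As a reference point, Generalized Iterative Scaling for a conditional maximum entropy model $p_\lambda(c \mid x) \propto \exp\big(\sum_i \lambda_i f_i(x, c)\big)$ updates each weight as below, assuming non-negative features whose sum $\sum_i f_i(x, c)$ equals the same constant $C$ for every pair $(x, c)$ (a slack feature is usually added to enforce this). The training pairs $(x_n, c_n)$ and the other symbols are illustrative notation, not the cited paper's.

```latex
\lambda_i^{(k+1)} = \lambda_i^{(k)} + \frac{1}{C}
  \log \frac{\sum_{n} f_i(x_n, c_n)}
            {\sum_{n} \sum_{c} p_{\lambda^{(k)}}(c \mid x_n)\, f_i(x_n, c)}
```

The update leaves $\lambda_i$ unchanged once the model's expected count of feature $f_i$ matches its empirical count in the training data.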


Publication date: 2003